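We report on aggressive quantization strategies that greatly accelerate inference of Recurrent Neural Network Transducers (RNN-T). We use a 4-bit integer representation for both weights and activations and apply Quantization Aware Training (QAT) to retrain the full model (acoustic encoder and language model), achieving near-iso-accuracy. We show that customized quantization schemes tailored to the local properties of the network are essential to achieve good performance while limiting the computational overhead of QAT. Density ratio language model fusion has shown accuracy gains on RNN-T workloads, but it severely increases the computational cost of inference. We show that our quantization strategies enable large beam widths for hypothesis search while achieving streaming-compatible runtimes and a full model compression ratio of 7.6$\times$ compared to the full-precision model. Via hardware simulations, we estimate a 3.4$\times$ acceleration from FP16 to INT4 for the end-to-end quantized RNN-T, inclusive of LM fusion, resulting in a Real Time Factor (RTF) of 0.06. On the NIST Hub5 2000, Hub5 2001, and RT-03 test sets, we retain most of the gains associated with LM fusion, improving the average WER by 1.5%.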
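As a rough illustration of the two quantities the abstract above relies on, the sketch below shows a generic symmetric quantize-dequantize step onto 4-bit signed integers and the Real Time Factor computation. It is not the paper's customized scheme, which the abstract does not specify: the per-tensor max-abs scale and the function names `fake_quantize_int4` and `real_time_factor` are assumptions made for illustration.

```python
import numpy as np

def fake_quantize_int4(x, scale=None):
    """Symmetric uniform quantize-dequantize onto the INT4 range [-8, 7].

    Assumption: one scale per tensor, chosen from the maximum absolute value.
    """
    x = np.asarray(x, dtype=np.float32)
    if scale is None:
        max_abs = float(np.max(np.abs(x)))
        # Map the largest magnitude to integer level 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the 4-bit signed range.
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    # Dequantize so that downstream layers see the quantization error,
    # which is what quantization aware training exposes during retraining.
    x_hat = q.astype(np.float32) * scale
    return q, x_hat

def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio decoded."""
    return processing_seconds / audio_seconds

if __name__ == "__main__":
    weights = np.array([-1.0, -0.3, 0.0, 0.2, 0.7], dtype=np.float32)
    q, w_hat = fake_quantize_int4(weights)
    print("integer levels:", q)
    print("max abs error :", float(np.max(np.abs(weights - w_hat))))
    print("RTF for 6 s of compute on 100 s of audio:", real_time_factor(6.0, 100.0))
```

An RTF below 1 means decoding runs faster than the audio plays, which is the sense in which a runtime is streaming-compatible.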
The application of deep learning algorithms to financial data is difficult due to heavy non-stationarities which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular data which is robust under changes in market conditions. We evaluate various machine-learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks with and without simple feature engineering, as the building blocks for the pipeline. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown in volatile market conditions. Furthermore, we demonstrate that the creation of model ensembles through dynamic model selection based on recent model performance leads to improved performance over baseline by improving the Sharpe and Calmar ratios. We also evaluate the robustness of our pipeline across different data splits and random seeds with good reproducibility of results.
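The core of feature neutralisation can be sketched as a linear projection applied after prediction. The snippet below is a minimal static version, assuming neutralisation means subtracting the least-squares projection of the predictions onto the feature columns. The dynamic choice of which features to neutralise over time, described in the abstract above, is not shown, and the name `neutralise` and the `proportion` parameter are illustrative.

```python
import numpy as np

def neutralise(predictions, features, proportion=1.0):
    """Remove a fraction of the linear feature exposure from predictions.

    predictions: shape (n_samples,)
    features:    shape (n_samples, n_features)
    proportion:  1.0 removes the whole linear exposure, 0.0 removes nothing.
    """
    p = np.asarray(predictions, dtype=float)
    F = np.asarray(features, dtype=float)
    # Least-squares coefficients of the predictions regressed on the features.
    beta, *_ = np.linalg.lstsq(F, p, rcond=None)
    # Part of the predictions that is linearly explained by the features.
    exposure = F @ beta
    return p - proportion * exposure

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.normal(size=(50, 3))
    p = F @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=50)
    neutral = neutralise(p, F)
    # After full neutralisation the predictions are orthogonal to every feature.
    print(np.round(F.T @ neutral, 8))
```

Because the step only needs the predictions and the feature matrix, it can be applied to the output of any model without retraining, which matches the property highlighted in the abstract.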
Diffusion Probabilistic Models (DPMs) have recently been employed for image deblurring. DPMs are trained via a stochastic denoising process that maps Gaussian noise to the high-quality image, conditioned on the concatenated blurry input. Despite their high-quality generated samples, image-conditioned Diffusion Probabilistic Models (icDPMs) rely on synthetic pairwise training data (in-domain), with potentially unclear robustness towards real-world unseen images (out-of-domain). In this work, we investigate the generalization ability of icDPMs in deblurring, and propose a simple but effective guidance to significantly alleviate artifacts and improve out-of-distribution performance. In particular, we propose to first extract a multiscale domain-generalizable representation from the input image that removes domain-specific information while preserving the underlying image structure. The representation is then added into the feature maps of the conditional diffusion model as extra guidance that helps improve generalization. To benchmark, we focus on out-of-distribution performance by applying a model trained on a single dataset to three external and diverse test sets. The effectiveness of the proposed formulation is demonstrated by improvements over the standard icDPM, as well as state-of-the-art performance on perceptual quality and competitive distortion metrics compared to existing methods.
In peer review systems, reviewers are often asked to evaluate various features of submissions, such as technical quality or novelty. A score is given to each of the predefined features, and based on these the reviewer has to provide an overall quantitative recommendation. However, reviewers differ in how much they value different features. It may be assumed that each reviewer has her own mapping from a set of criteria scores (score vectors) to a recommendation, and that different reviewers have different mappings in mind. Recently, Noothigattu, Shah and Procaccia introduced a novel framework for obtaining an aggregated mapping by means of Empirical Risk Minimization based on $L(p,q)$ loss functions, and studied its axiomatic properties in the sense of social choice theory. We provide a body of new results about this framework. On the one hand, we study a trade-off between strategy-proofness and the ability of the method to properly capture agreements of the majority of reviewers. On the other hand, we show that dropping a certain unrealistic assumption invalidates the previously reported results. Moreover, in the general case, strategy-proofness fails dramatically, in the sense that a reviewer is able to make significant changes to the solution in her favor by arbitrarily small changes to her true beliefs. In particular, no approximate version of strategy-proofness is possible in this general setting, since the method is not even continuous w.r.t. the data. Finally, we propose a modified aggregation algorithm which is continuous and show that it has good axiomatic properties.
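To make the $L(p,q)$ objective concrete, the sketch below evaluates such a loss for a candidate aggregated mapping. It assumes the loss takes a $p$-norm over each reviewer's residuals (overall recommendation minus the mapping's output) and then a $q$-norm across reviewers. This form is inferred from the name and is not spelled out in the abstract, and `lpq_loss` is a hypothetical helper name.

```python
import numpy as np

def lpq_loss(residuals_by_reviewer, p=1.0, q=1.0):
    """q-norm across reviewers of the p-norm of each reviewer's residuals.

    residuals_by_reviewer: one sequence per reviewer, holding
    (given recommendation - f(score vector)) for each paper she reviewed.
    Assumed form: ( sum_i ( sum_j |r_ij|^p )^(q/p) )^(1/q).
    """
    per_reviewer = [
        np.sum(np.abs(np.asarray(r, dtype=float)) ** p) ** (1.0 / p)
        for r in residuals_by_reviewer
    ]
    return float(np.sum(np.asarray(per_reviewer) ** q) ** (1.0 / q))

if __name__ == "__main__":
    # Two reviewers; the first reviewed two papers, the second one paper.
    residuals = [[1.0, -2.0], [3.0]]
    print(lpq_loss(residuals, p=1, q=1))  # sum of absolute residuals
    print(lpq_loss(residuals, p=2, q=1))  # sum of per-reviewer Euclidean norms
```

Empirical Risk Minimization would then search a class of mappings for the one with the smallest loss; that search is not sketched here.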
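Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset. In this paper, we present mRobust04, a multilingual version of Robust04 that was translated into 8 languages using Google Translate. We also provide results of three different multilingual retrievers on this dataset. The dataset is available at https://huggingface.co/datasets/unicamp-dl/mrobust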
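Estimating the causal effects of a spatially varying intervention on a spatially varying outcome may be subject to non-local confounding (NLC), a phenomenon that can bias estimates when the treatment and outcome of a given unit are determined in part by the covariates of other nearby units. In particular, NLC is a challenge for evaluating the effects of environmental policies and climate events on health-related outcomes such as air pollution exposure. This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference. It then proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information as a scalar or vector defined for each observational unit, which is then used in conjunction with causal inference methods. The framework is evaluated in a simulation study and in two case studies on air pollution in which weather is an (inherently regional) known confounder.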
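We define a broader family of corruption processes that generalizes previously known diffusion models. To reverse these general diffusions, we propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process and yields state-of-the-art results for CelebA. Soft Score Matching incorporates the degradation process in the network and trains the model to predict a clean image that, after corruption, matches the diffused observation. We show that our objective learns the gradient of the likelihood under suitable regularity conditions for the family of corruption processes. We further develop a principled way to select the corruption levels for general diffusion processes and a novel sampling method that we call the Momentum Sampler. We evaluate our framework with the corruption being Gaussian blur and low-magnitude additive noise. Our method achieves a state-of-the-art FID score of $1.85$ on CelebA-64, outperforming all previous linear diffusion models. We also show significant computational benefits compared to vanilla denoising diffusion.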
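We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods have relied on regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper level. We present a mixed-integer linear program reformulation of the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features even for instances with a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.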
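The quantity the upper-level problem minimizes can be illustrated with a short sketch. It assumes the standard newsvendor cost (a per-unit underage cost for unmet demand and a per-unit overage cost for leftover stock) and a linear decision function over the selected features. The cost values, the function names, and the toy data are hypothetical, and the mixed-integer reformulation itself is not shown.

```python
import numpy as np

def newsvendor_cost(order, demand, underage=2.0, overage=1.0):
    """Average newsvendor cost of ordering `order` when demand is `demand`."""
    order = np.asarray(order, dtype=float)
    demand = np.asarray(demand, dtype=float)
    shortage = np.maximum(demand - order, 0.0)  # unmet demand
    leftover = np.maximum(order - demand, 0.0)  # unsold units
    return float(np.mean(underage * shortage + overage * leftover))

def validation_cost(coef, selected, X_val, d_val, underage=2.0, overage=1.0):
    """Held-out cost of a linear order rule restricted to selected features.

    coef:     coefficients learned on the training set (lower level)
    selected: 0/1 indicator per feature chosen by the upper level
    """
    mask = np.asarray(selected, dtype=bool)
    X_val = np.asarray(X_val, dtype=float)
    coef = np.asarray(coef, dtype=float)
    orders = X_val[:, mask] @ coef[mask]
    return newsvendor_cost(orders, d_val, underage, overage)

if __name__ == "__main__":
    X_val = np.array([[1.0, 2.0], [1.0, 3.0]])  # first column acts as intercept
    d_val = np.array([5.0, 6.0])
    coef = np.array([5.0, 100.0])
    # Only the first feature is selected, so the second coefficient is ignored.
    print(validation_cost(coef, [1, 0], X_val, d_val))
```

Comparing this held-out cost across candidate feature subsets is the selection criterion described in the abstract; the bilevel program performs that comparison exactly rather than by enumeration.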
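We introduce three algorithms that invert simulated gravity data to 3D subsurface rock/flow properties. The first algorithm is a data-driven, deep learning-based approach; the second combines a deep learning approach with physical modeling in a single workflow; and the third accounts for the time dependence of surface gravity monitoring. The target application of these algorithms is the prediction of subsurface CO$_2$ plumes as a complementary tool for monitoring CO$_2$ sequestration deployments. Each proposed algorithm outperforms traditional inversion methods and produces high-resolution 3D subsurface reconstructions in near real time. We report Dice scores for the predicted plume geometry and near-perfect data misfit in terms of $\mu$Gals. These results indicate that combining 4D surface gravity monitoring with deep learning techniques represents a low-cost, rapid, and non-intrusive method for monitoring CO$_2$ storage sites.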
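For readers unfamiliar with the Dice score used to assess plume geometry, a minimal sketch follows. It assumes binary masks (plume versus no plume) on a common grid; how the predicted properties are thresholded into such masks is not stated in the abstract, and the `dice_score` name and the convention for two empty masks are assumptions.

```python
import numpy as np

def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * float(intersection) / float(total)

if __name__ == "__main__":
    predicted = np.array([[1, 1, 0], [0, 0, 0]])
    reference = np.array([[1, 0, 0], [0, 0, 0]])
    print(dice_score(predicted, reference))
```

The score ranges from 0 (no overlap) to 1 (identical masks) and works unchanged on 3D arrays.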
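Background: AI-based analysis of sufficiently large, curated medical datasets has been shown to be promising for early detection, faster diagnosis, better decision-making, and more effective treatment. However, access to such highly confidential and very sensitive medical data, obtained from a variety of sources, is usually highly restricted, since improper use, unsafe storage, data leakage, or abuse could violate a person's privacy. In this work, we apply a federated learning paradigm over heterogeneous, siloed sets of high-definition electrocardiograms arriving from 12-lead ECG sensor arrays to train AI models. We evaluate the capacity of the resulting models to achieve performance equivalent to that of state-of-the-art models trained when the same data is collected in a central place. Methods: We propose a privacy-preserving methodology for training AI models based on the federated learning paradigm over a heterogeneous, distributed dataset. The methodology is applied to a broad range of machine learning techniques based on gradient boosting, convolutional neural networks, and recurrent neural networks with long short-term memory. The models were trained on an ECG dataset containing 12-lead recordings collected from 43,059 patients across six geographically separate and heterogeneous sources. Findings: The resulting set of AI models for detecting cardiovascular abnormalities achieved predictive performance comparable to that of models trained using a centralised learning approach. Interpretation: Computing the parameters that contribute to the global model locally, and then exchanging only those parameters instead of the whole sensitive dataset as in conventional ML, helps preserve medical data privacy.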
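The parameter exchange described in the Interpretation can be sketched as one aggregation round. The abstract does not name the aggregation rule, so the sample-size-weighted average below (in the style of federated averaging) is an assumption, as are the name `federated_average` and the toy numbers. It applies to models with a fixed parameter vector such as neural networks, not directly to gradient boosting.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Combine locally trained parameters into global parameters.

    client_params: one parameter array per site, all of the same shape
    client_sizes:  number of local training examples at each site
    Only these arrays leave the sites; the raw recordings never do.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()  # each site's share of the data
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    # Weighted sum over the site axis.
    return np.tensordot(weights, stacked, axes=1)

if __name__ == "__main__":
    site_a = np.array([1.0, 2.0])  # parameters trained at a small site
    site_b = np.array([3.0, 4.0])  # parameters trained at a larger site
    print(federated_average([site_a, site_b], client_sizes=[100, 300]))
```

In a full training loop the server would send the averaged parameters back to every site, and the sites would continue training locally before the next round.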